Abstract: This note investigates how various ideas of "expectedness" can be captured in the framework of
possibility theory. In particular, we are interested in introducing estimates of the kind of lack of surprise
expressed by people when saying "I would not be surprised that..." before an event takes place, or "I knew
it" after its realization. In possibility theory, a possibility distribution is supposed to model the relative levels of
possibility of mutually exclusive alternatives in a set or, equivalently, the alternatives are assumed to be rank-ordered
according to their level of possibility of taking place. Four basic set functions associated with a possibility
distribution, including the standard possibility and necessity measures, are discussed from the point of view of what they
estimate when applied to potential events. Extensions of these estimates based on the notions of Q-projection or
OWA operators are proposed for the case where only significant parts of the possibility distribution are retained in the
evaluation. The case of partially-known possibility distributions is also considered. Some potential applications are outlined.


1 - Introduction

In case of incomplete knowledge, people facing a query like "is A (going to be) true?" may answer it in a great
variety of ways, such as "I don't know", "it is not impossible", "it is quite possible", "I would not be surprised that
A is true", "I am quite certain that A is true", etc., according to the actual state of knowledge and the query.
Possibility theory (Zadeh, 1978) offers a framework for modelling uncertain or vague information by means of a
possibility distribution. Such a distribution assesses the level of possibility of each possible value of a considered
(single-valued) variable x, i.e. the elements of the domain of the variable x are rank-ordered according to their relative
possibility on the scale [0,1]. Then a possibility measure $\Pi$ is associated with the distribution, and $\Pi(A)$ estimates
the consistency of the available knowledge with the statement "A is true" (short for "x is in A is true"). A dual
measure of necessity N estimates the certainty of A as the impossibility of "non A", namely
$N(A) = \mathrm{Impos}(\text{non } A) = 1 - \Pi(\text{non } A)$. Then $N(\text{non } A)$, the certainty that A is false, can be
interpreted as a degree of surprise $S(A) = N(\text{non } A) = \mathrm{Impos}(A)$ that A is true. This corresponds exactly to
the view developed by the English economist Shackle (1961), who worked out a non-probabilistic model of expectation
before the introduction of possibility theory. However, this notion of surprise, where $\Pi(A) = 1 - S(A)$, does not seem to
correspond exactly to the intended meaning of a sentence such as "I would not be surprised that A is true", which rather
expresses that "A true" is more than just possible (even with a high degree), and is not far from being somewhat certain;
what is stated is a very strong kind of possibility.

In this note we investigate what estimates can be defined from a possibility distribution-based knowledge
representation in order to evaluate, in various ways, how much an event such as "x is in A" is expected to be
true. The next section introduces four basic set functions defined from a possibility distribution. These are then
extended using the notions of OWA operators or of Q-projection, and also to the case of partially-defined possibility
distributions.


2 - The Four Basic Set Functions in Possibility Theory 

Let U be the domain of a single-valued variable x. In this note, U is supposed to be finite for simplicity. A
possibility distribution $\pi_x$ on U is a function from U to [0,1] which constrains the possible values of x according to
the available information; $\pi_x(u) = 0$ means that x = u is definitely impossible while $\pi_x(u) = 1$ means that
absolutely nothing prevents x = u. A possibility distribution $\pi_x$ is said to be normalized iff $\exists u_0 \in U$,
$\pi_x(u_0) = 1$, i.e. at least one value of x in U is completely possible, which is natural if U is an exhaustive domain
for x. $\pi_x$ can be viewed as a simple way of encoding a preference relation among the possible values of the variable
x; the smaller $\pi_x(u)$, the more unexpected x = u (or the less feasible x = u). It is assumed in the following that
$\pi_x$ is normalized.

Given a possibility distribution $\pi_x$ and an event A, four basic estimates can be imagined which are in agreement
with the ordinal nature of $\pi_x$, namely


- the possibility measure (Zadeh, 1978)

$\Pi_x(A) = \max_{u \in A} \pi_x(u)$ ; (1)

- the guaranteed possibility (Dubois and Prade, 1992)

$\Delta_x(A) = \min_{u \in A} \pi_x(u)$ ; (2)

and the similar evaluations for "non A", denoted $\bar{A}$, whose complements to 1 are taken in order to define meaningful
quantities for A ($\pi_x$ should be normalized), namely

- the necessity measure

$N_x(A) = \min_{u \notin A} (1 - \pi_x(u)) = 1 - \Pi_x(\bar{A})$ ; (3)

- the unguaranteed necessity

$\nabla_x(A) = \max_{u \notin A} (1 - \pi_x(u)) = 1 - \Delta_x(\bar{A})$. (4)

$\Pi_x(A)$ estimates to what extent there exists a value u in A which is possible for x, i.e. the consistency of the
proposition "x is in A" with what is not unexpected according to the available information.

$\Delta_x(A)$ estimates to what extent all the values in A are actually possible for x according to what is known; any
value in A is at least possible for x at the degree $\Delta_x(A)$, so $\Delta_x(A)$ expresses a guaranteed possibility since
it is a minimum level over A.

$N_x(A)$ estimates to what extent all the values in $\bar{A}$ are impossible for x, or equivalently to what extent the value
of x is necessarily in A; any value in $\bar{A}$ is at most possible for x at the degree $1 - N_x(A)$.

$\nabla_x(A)$ estimates to what extent there exists a value u in $\bar{A}$ which is impossible for x. It is a measure of
unguaranteed necessity in favor of A since we check the impossibility for x of only one value in $\bar{A}$, and not the
impossibility of all.
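
The four definitions (1)-(4) can be computed directly on a finite domain. The Python sketch below is only an illustration: the function names, the dictionary encoding of the distribution and the example values are assumptions introduced here, not part of the original note. The conventions used for empty sets (a maximum over the empty set is 0, a minimum is 1) are also an assumption.

```python
from typing import Dict, Hashable, Set

# Hypothetical encoding: each value u of the domain U is mapped to pi_x(u).
Distribution = Dict[Hashable, float]


def possibility(pi: Distribution, A: Set[Hashable]) -> float:
    """Pi_x(A): highest possibility level among the values in A, eq. (1)."""
    return max((pi[u] for u in A), default=0.0)


def guaranteed_possibility(pi: Distribution, A: Set[Hashable]) -> float:
    """Delta_x(A): lowest possibility level among the values in A, eq. (2)."""
    return min((pi[u] for u in A), default=1.0)


def necessity(pi: Distribution, A: Set[Hashable]) -> float:
    """N_x(A): lowest impossibility level among the values outside A, eq. (3)."""
    return min((1.0 - pi[u] for u in pi if u not in A), default=1.0)


def unguaranteed_necessity(pi: Distribution, A: Set[Hashable]) -> float:
    """Nabla_x(A): highest impossibility level among the values outside A, eq. (4)."""
    return max((1.0 - pi[u] for u in pi if u not in A), default=0.0)


# Illustrative normalized distribution (made-up values) and event.
pi_x = {"a": 1.0, "b": 0.75, "c": 0.25, "d": 0.0}
A = {"a", "b"}

print(possibility(pi_x, A), guaranteed_possibility(pi_x, A),
      necessity(pi_x, A), unguaranteed_necessity(pi_x, A))
# prints: 1.0 0.75 0.75 1.0
```

In this example every value in A is possible at least at degree 0.75, and every value outside A is impossible at least at degree 0.75, which is the kind of situation where one would not be surprised to find x in A.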

Clearly

$\Delta_x(A) \leq \Pi_x(A)$ (5)

$N_x(A) \leq \nabla_x(A)$. (6)

Provided that $\pi_x$ is normalized, and that $\exists u \in U$, $\pi_x(u) = 0$ (at the technical level, it is always possible
to add an extra element to U, if necessary, in order to satisfy this requirement), we have the stronger inequality
(Dubois and Prade, 1992)

$\max(N_x(A), \Delta_x(A)) \leq \min(\Pi_x(A), \nabla_x(A))$. (7)

Thus $\Delta_x$ corresponds to a very strong possibility and $\nabla_x$ to a very weak necessity. Noticeably, $N_x$ and
$\Delta_x$ are completely unrelated, as are $\Pi_x$ and $\nabla_x$. When estimating the tendency of A to contain the true
value of x, we have indeed two complementary points of view: the extent to which values in A are effectively possible, and
the extent to which values outside A are impossible. These two complementary evaluations may contribute to estimating
our lack of surprise at having A true.

The four measures enjoy the following characteristic properties (the subscript x is omitted in the following)

$\Pi(A \cup B) = \max(\Pi(A), \Pi(B))$ ; (8)

$\Delta(A \cup B) = \min(\Delta(A), \Delta(B))$ ; (9)

$N(A \cap B) = \min(N(A), N(B))$ ; (10)

$\nabla(A \cap B) = \max(\nabla(A), \nabla(B))$. (11)


Thus $\Pi$ and N are monotonically increasing with respect to set inclusion, while $\Delta$ and $\nabla$ are decreasing.
The interval $[\Delta(A), \Pi(A)]$ characterizes the amplitude of the variation of the levels of possibility among the
values in A, the interval $[N(A), \nabla(A)] = 1 - [\Delta(\bar{A}), \Pi(\bar{A})]$ the amplitude of the variations in
$\bar{A}$. We can then symbolically write $[N(A), \nabla(A)] = [N,\nabla](A)$ and $[\Delta(A), \Pi(A)] = [\Delta,\Pi](A)$;
then we have

$[N,\nabla](A) = 1 - [\Delta,\Pi](\bar{A})$ (12)

and (8)-(9)-(10)-(11) become

$[\Delta,\Pi](A \cup B) = mM([\Delta,\Pi](A), [\Delta,\Pi](B))$ (13)

$[N,\nabla](A \cap B) = mM([N,\nabla](A), [N,\nabla](B))$ (14)

with $mM([a,b],[c,d]) = [\min(a,c), \max(b,d)]$, and then $1 - mM([a,b],[c,d]) = mM(1 - [a,b], 1 - [c,d])$. Note that
$mM([a,b],[c,d]) = \mathrm{convex\_hull}([a,b] \cup [c,d])$.
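
The interval notation of (12)-(14) can be checked numerically on a small example. The sketch below is a minimal illustration; the helper names (`delta_pi`, `n_nabla`, `mM`, `one_minus`) and the example distribution are assumptions made here, and both the event and its complement are assumed to be non-empty.

```python
from typing import Dict, Hashable, Set, Tuple

Interval = Tuple[float, float]


def delta_pi(pi: Dict[Hashable, float], A: Set[Hashable]) -> Interval:
    """[Delta, Pi](A): min and max possibility levels over A (A non-empty)."""
    levels = [pi[u] for u in A]
    return (min(levels), max(levels))


def n_nabla(pi: Dict[Hashable, float], A: Set[Hashable]) -> Interval:
    """[N, Nabla](A): min and max impossibility levels outside A."""
    levels = [1.0 - pi[u] for u in pi if u not in A]
    return (min(levels), max(levels))


def mM(i: Interval, j: Interval) -> Interval:
    """mM([a,b],[c,d]) = [min(a,c), max(b,d)], the convex hull of the union."""
    return (min(i[0], j[0]), max(i[1], j[1]))


def one_minus(i: Interval) -> Interval:
    """1 - [a,b] = [1-b, 1-a]."""
    return (1.0 - i[1], 1.0 - i[0])


# Illustrative values only.
pi = {"a": 1.0, "b": 0.75, "c": 0.25, "d": 0.0}
A, B = {"a", "b"}, {"b", "c"}
complement_A = set(pi) - A

# (12): [N, Nabla](A) = 1 - [Delta, Pi](complement of A)
print(n_nabla(pi, A), one_minus(delta_pi(pi, complement_A)))
# (13): [Delta, Pi](A u B) = mM([Delta, Pi](A), [Delta, Pi](B))
print(delta_pi(pi, A | B), mM(delta_pi(pi, A), delta_pi(pi, B)))
# (14): [N, Nabla](A n B) = mM([N, Nabla](A), [N, Nabla](B))
print(n_nabla(pi, A & B), mM(n_nabla(pi, A), n_nabla(pi, B)))
```

Each printed line shows the same interval twice, once computed directly and once through the corresponding identity.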

We now discuss what measures of expectedness and surprise are, and we introduce generalizations of the set functions
$\Pi$, $\Delta$, N, $\nabla$ based on the notions of OWA operators, or of Q-projection, in order to build intermediary
estimates which may be used as estimates of how much an event A is expected to be true.


3 - Measures of Expectedness and Surprise

In this section we shall introduce some formal mechanism for capturing the concepts of "expectedness" and 
"surprise" associated with a set, based upon the assumption of some possibility distribution. 

Assume we are concerned about John's height. Then a possibility distribution would be induced by the knowledge
that John is "tall". In this situation, if it were found that John's height is six feet seven inches, one would not be
surprised and would even have expected an answer like that. In the following we shall suggest some formal methods
for capturing a measure of these concepts.

Assume we have a variable x which induces a possibility distribution $\pi_x$ on U. Let A be a crisp subset of U. We
shall let Exp(A) measure the degree of expectedness of A based upon $\pi$. We shall define this measure as the truth of
the proposition

"most of the elements not in A are not possible".

We can more formally express this partial inclusion of $\bar{A}$ into the fuzzy set of values of U which are rather
impossible, as

$\mathrm{Exp}(A) = \mathrm{most}_{u \notin A} [1 - \pi(u)]$

where '$\mathrm{most}_u$' refers to the proportion of elements in $\bar{A}$ whose degree of possibility should be low. In this
section and in the next one, we shall propose two slightly different ways of precisely defining this formal expression,
using either OWA operators or Q-projections. Let us first consider a special extreme case of "most": "all". In this case

$\mathrm{Exp}(A) = \min_{u \notin A} [1 - \pi(u)]$.

Thus this extreme definition becomes what we previously called the necessity measure: the extreme of
expectedness is necessity.

In order to evaluate expectedness in the general case, we can use the concept of OWA operators introduced by Yager
(1988), i.e.

$\mathrm{Exp}(A) = \mathrm{OWA}_{u \notin A} [1 - \pi(u)]$.

Let $\bar{A} = \{u_1, \ldots, u_r\}$. Let $a_i = 1 - \pi(u_i)$. Let $(\omega_1, \ldots, \omega_r)$ be a set of weights such that

1) $\forall i$, $\omega_i \in [0,1]$ ;

2) $\sum_i \omega_i = 1$ ;

then $\mathrm{OWA}(a_1, \ldots, a_r) = \sum_i \omega_i \cdot b_i$, where $b_i$ is the ith largest of the $a_j$. Two extreme cases
of weights are worth noting. Taking $\omega_r = 1$ (and then all others are zero), we get:

$\mathrm{OWA}(a_1, \ldots, a_r) = b_r = \min_i a_i = \min_{u \notin A} [1 - \pi(u)]$,

i.e. the necessity measure of A. When $\omega_1 = 1$ (and then all others are zero), we get

$\mathrm{OWA}(a_1, \ldots, a_r) = b_1 = \max_i a_i = \max_{u \notin A} [1 - \pi(u)]$,

which is what we previously called the unguaranteed necessity. When $\omega_i = 1/k$ for $i = 1, \ldots, k$ with $k \leq r$
(and all other weights are zero), we compute the average of the k largest levels of impossibility $a_i = 1 - \pi(u_i)$.
Following the discussion in Yager (1988), we can express "most" by an appropriate selection of weights.
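
The OWA-based definition of Exp(A) and its two extreme cases can be sketched as follows. This is a minimal illustration under assumptions made here: the function names, the dictionary encoding of $\pi$ and the example values are not taken from the original note.

```python
from typing import Dict, Hashable, Sequence, Set


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: sum_i w_i * b_i, with b_i the i-th largest value."""
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per value")
    if any(w < 0.0 or w > 1.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    ordered = sorted(values, reverse=True)  # b_1 >= b_2 >= ... >= b_r
    return sum(w * b for w, b in zip(weights, ordered))


def expectedness(pi: Dict[Hashable, float], A: Set[Hashable],
                 weights: Sequence[float]) -> float:
    """Exp(A): OWA of the impossibility levels 1 - pi(u) of the values outside A."""
    impossibility = [1.0 - pi[u] for u in pi if u not in A]
    return owa(impossibility, weights)


# Illustrative values only; the complement of A has r = 3 elements.
pi = {"a": 1.0, "b": 0.5, "c": 0.25, "d": 0.0}
A = {"a"}

exp_all = expectedness(pi, A, [0.0, 0.0, 1.0])   # w_r = 1: necessity N(A)
exp_one = expectedness(pi, A, [1.0, 0.0, 0.0])   # w_1 = 1: unguaranteed necessity
exp_top2 = expectedness(pi, A, [0.5, 0.5, 0.0])  # average of the 2 largest levels
print(exp_all, exp_one, exp_top2)
# prints: 0.5 1.0 0.875
```

Intermediate weight vectors move Exp(A) between the necessity and the unguaranteed necessity, which is how a soft quantifier such as "most" can be encoded.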

We now need to introduce a formal definition for the "surprise of A" given a possibility distribution. We denote by Sur(A)
the measure of surprise and define it as the truth of the proposition

"most of the elements in A are not possible".

We can more formally express this as

$\mathrm{Sur}(A) = \mathrm{most}_{u \in A} (1 - \pi(u))$.

As we can see, we have $\mathrm{Sur}(A) = \mathrm{Exp}(\bar{A})$, which expresses that A is surprising if non-A is expected.
However, we do not have $\mathrm{Sur}(A) = 1 - \mathrm{Exp}(A)$ at the same time (i.e. "A is surprising" is different from
"A is unexpected" in our model). Clearly these two understandings of Sur(A) would be equivalent in a probabilistic model.
Again considering the special case where "most" is replaced by "all" we get

$\mathrm{Sur}(A) = \min_{u \in A} [1 - \pi(u)]$.

This special case can be further simplified so that

$\mathrm{Sur}(A) = 1 - \max_{u \in A} \pi(u) = 1 - \Pi(A)$

which corresponds to the definition of Shackle (1961).

Considering the more general case of surprise (with "most" in place of "all"), we can use OWA operators to
implement the formal expression by appropriate selection of the weights. At the extreme when $\omega_s = 1$, with
$A = \{u_{r+1}, \ldots, u_s\}$, we get $\mathrm{Sur}(A) = \min_{u \in A} (1 - \pi(u)) = 1 - \Pi(A)$, while when $\omega_{r+1} = 1$

$\mathrm{Sur}(A) = \max_{u \in A} (1 - \pi(u)) = 1 - \min_{u \in A} \pi(u) = 1 - \Delta(A)$.

We can further observe that if one considers the negation of "most" as "at_least_a_few", then

$\mathrm{Sur}(A) = 1 - \mathrm{at\_least\_a\_few}_{u \in A} [\pi(u)]$

where at_least_a_few corresponds to an ordered weighted average OWA' related to the one defining
$\mathrm{Sur}(A) = \mathrm{OWA}_{u \in A} [1 - \pi(u)]$ in the following way.
$\mathrm{Sur}(A) = \mathrm{OWA}(1 - \pi(u_{r+1}), \ldots, 1 - \pi(u_s)) = \sum_i \omega_i \cdot b'_i$ where $b'_i$ is the
ith largest of the $1 - \pi(u_j)$. Then
$\mathrm{Sur}(A) = \sum_i \omega_i - \sum_i \omega_i (1 - b'_i) = 1 - \mathrm{OWA}'(\pi(u_{r+1}), \ldots, \pi(u_s)) = 1 - \sum_i \omega_i \cdot c_i$
where $c_i$ is the ith smallest of the $\pi(u_j)$.
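
The identity between Sur(A) and one minus the OWA' of the possibility degrees can be verified numerically. The sketch below is illustrative only; the function names and the example weights are assumptions made here, chosen so that all values are exactly representable in floating point.

```python
from typing import Dict, Hashable, Sequence, Set


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weight i applies to the i-th largest value."""
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per value")
    return sum(w * b for w, b in zip(weights, sorted(values, reverse=True)))


def surprise(pi: Dict[Hashable, float], A: Set[Hashable],
             weights: Sequence[float]) -> float:
    """Sur(A): OWA of the impossibility levels 1 - pi(u) of the values in A."""
    return owa([1.0 - pi[u] for u in A], weights)


def surprise_dual(pi: Dict[Hashable, float], A: Set[Hashable],
                  weights: Sequence[float]) -> float:
    """1 - OWA'(pi): the same weights applied to the i-th SMALLEST pi(u), u in A."""
    ascending = sorted(pi[u] for u in A)
    return 1.0 - sum(w * c for w, c in zip(weights, ascending))


# Illustrative values only.
pi = {"a": 1.0, "b": 0.5, "c": 0.25, "d": 0.0}
A = {"b", "c", "d"}
weights = [0.25, 0.25, 0.5]

print(surprise(pi, A, weights), surprise_dual(pi, A, weights))
# prints: 0.6875 0.6875

# Extreme case "all" (last weight equal to 1): Sur(A) = 1 - Pi(A).
print(surprise(pi, A, [0.0, 0.0, 1.0]), 1.0 - max(pi[u] for u in A))
# prints: 0.5 0.5
```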

Remark: Extension to belief structures

We shall here briefly suggest the extension of the preceding ideas to the case in which our basic knowledge is a
belief structure of the type introduced by Shafer (1976). Assume we have a belief structure consisting of the focal
elements $B_1, \ldots, B_n$ with weights $m(B_j)$ (and $\sum_j m(B_j) = 1$). We can define the degree of amazement associated
with the subset A as

$\mathrm{Amaze}(A) = \sum_j \mathrm{Sur}(A \mid B_j) \cdot m(B_j)$

where

$\mathrm{Sur}(A \mid B_j) = \mathrm{most}_{u \in A} [1 - \mu_{B_j}(u)]$

and $\mu_{B_j}$ is the characteristic function of $B_j$. We can define the degree of anticipation associated with A as

$\mathrm{Anticipate}(A) = \sum_j \mathrm{Exp}(A \mid B_j) \cdot m(B_j)$

where

$\mathrm{Exp}(A \mid B_j) = \mathrm{most}_{u \notin A} [1 - \mu_{B_j}(u)]$.
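
As a rough illustration of this remark, the sketch below evaluates Amaze(A) and Anticipate(A) with "most" implemented by an OWA operator, as in the earlier part of this section. The encoding of the belief structure as a list of (focal element, mass) pairs, the function names and the numerical masses are all assumptions made for the example.

```python
from typing import Hashable, List, Sequence, Set, Tuple

FocalElements = List[Tuple[Set[Hashable], float]]  # hypothetical encoding: (B_j, m(B_j))


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weight i applies to the i-th largest value."""
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per value")
    return sum(w * b for w, b in zip(weights, sorted(values, reverse=True)))


def amaze(A: Set[Hashable], focal: FocalElements, weights: Sequence[float]) -> float:
    """Sum over j of Sur(A | B_j) * m(B_j), with Sur taken over the elements of A."""
    return sum(m * owa([0.0 if u in B else 1.0 for u in A], weights)
               for B, m in focal)


def anticipate(U: Set[Hashable], A: Set[Hashable], focal: FocalElements,
               weights: Sequence[float]) -> float:
    """Sum over j of Exp(A | B_j) * m(B_j), with Exp taken over the elements outside A."""
    return sum(m * owa([0.0 if u in B else 1.0 for u in U - A], weights)
               for B, m in focal)


# Illustrative belief structure (made-up masses summing to 1).
U = {"a", "b", "c", "d"}
A = {"a", "b"}
focal = [({"a", "b"}, 0.5), ({"b", "c"}, 0.25), ({"c", "d"}, 0.25)]

print(amaze(A, focal, [0.0, 1.0]))          # "all": mass of focal sets disjoint from A
print(amaze(A, focal, [0.5, 0.5]))          # a softer quantifier
print(anticipate(U, A, focal, [0.0, 1.0]))  # "all": mass of focal sets included in A
# prints 0.25, 0.375 and 0.5 on three lines
```

With the quantifier "all", Anticipate(A) is the total mass of the focal elements included in A, i.e. Shafer's degree of belief in A, and Amaze(A) is the total mass of the focal elements disjoint from A.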


4 - Generalizations Based on Q-Projection 

As shown in the preceding sections, $\Pi$, $\Delta$ (resp. N, $\nabla$) are closely related to the concepts
of surprise and expectedness, these concepts actually being related to the extremes of these measures. Crucial to the
determination of the measures of surprise and expectedness are evaluations based upon quantifiers such as "most",
lying between the extremes "for all" and "there exists" (corresponding to the min and max operations). In the previous
section we have suggested the use of OWA operators to implement these soft quantifiers. In this section we shall
suggest an alternative approach to the kinds of evaluations necessary. This approach is based upon the notion of
Q-projection (Yager, 1985). For simplicity, we only consider the case of non-fuzzy quantifiers where Q is a quantifier of
the type "at least r/k". We define each Q-projection in terms of a median operator, which has some notational
advantage for expressing Q-projection. In the following we shall first define the concept of Q-possibility of A. In the
ordinary measure of possibility we have Q = "at least one" (thus A is possible if just one element in A is possible),
while for the Q-possibility measure of the type discussed here, we have Q = "at least r/k", where k is the cardinality of
the set A. In this more general setting we are saying that A is Q-possible if at least r/k of the elements in A are
possible. We further note that if we define "most" by the appropriate selection of some value r as explained below,
we have

Sur(A) = 1 - Q-possibility(A).

Let $A = \{u_1, \ldots, u_k\}$ be the finite subset of U on which we want to estimate to what extent a given number (or a
given proportion) of values of A are possible. This number or proportion can be translated into a k-tuple of the form
$Q = (1, \ldots, 1, 0, \ldots, 0)$ where k = |A|, and where the number of '1' in the tuple representing Q is r. Then the
Q-possibility of A, denoted by Q(A), is defined by

$Q(A) = \mathrm{median}(\{\pi(u_1), \ldots, \pi(u_k)\} \cup \bar{Q} \cup \{1\})$ (15)

where $\bar{Q}$ denotes the complement of Q. Indeed, Q(A) is obtained as the median of a set of 2k + 1 elements made of
k - r + 1 elements equal to '1', of the k values $\pi(u_1), \ldots, \pi(u_k)$, and of r values equal to 0. Thus, Q(A) is equal
to the (k + 1)th value when the 2k + 1 elements are ranked in decreasing order, i.e. the rth value in the set
$\{\pi(u_1), \ldots, \pi(u_k)\}$, once these degrees are decreasingly ordered. Clearly $Q = (1, 0, \ldots, 0)$ (with (k - 1) '0')
gives back $Q(A) = \Pi(A)$, while $Q = (1, 1, \ldots, 1)$ (with k '1') yields $Q(A) = \Delta(A)$. Clearly, in any case

$Q(A) \in [\Delta,\Pi](A)$. (16)
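
The median formulation (15) can be checked against the direct reading "rth largest possibility degree in A". The following Python sketch is a minimal illustration; the function names and the example values are assumptions made here, and `statistics.median` from the standard library is used because the multiset always has an odd number (2k + 1) of elements.

```python
import statistics
from typing import Dict, Hashable, Set


def q_possibility(pi: Dict[Hashable, float], A: Set[Hashable], r: int) -> float:
    """Q(A) for Q = "at least r of the k elements of A", via the median of eq. (15)."""
    k = len(A)
    if not 1 <= r <= k:
        raise ValueError("r must satisfy 1 <= r <= |A|")
    degrees = [pi[u] for u in A]
    # Complement of Q = (1,...,1,0,...,0) with r ones: r zeros and k - r ones.
    q_bar = [0.0] * r + [1.0] * (k - r)
    return statistics.median(degrees + q_bar + [1.0])


def rth_largest(pi: Dict[Hashable, float], A: Set[Hashable], r: int) -> float:
    """Direct computation: the r-th value of pi over A in decreasing order."""
    return sorted((pi[u] for u in A), reverse=True)[r - 1]


# Illustrative values only.
pi = {"a": 1.0, "b": 0.75, "c": 0.25, "d": 0.0}
A = {"a", "b", "c"}

for r in (1, 2, 3):
    print(r, q_possibility(pi, A, r), rth_largest(pi, A, r))
# prints: 1 1.0 1.0 / 2 0.75 0.75 / 3 0.25 0.25 (one pair per line)
```

For r = 1 the result is $\Pi(A)$ and for r = k it is $\Delta(A)$, in agreement with (16).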

It can be shown (see Prade (1990) for instance) that the Q-possibility of A is nothing but the possibility measure
that the number of possible elements (according to $\pi$) is at least r, computed from the possibility distribution
representing the more or less possible values of the cardinality of the fuzzy subset of A made of the elements which
are rather possible.

By duality, quantities of the form $1 - Q'(\bar{A})$ can be introduced. We have

$1 - Q'(\bar{A}) = 1 - \mathrm{median}(\{\pi(u'_1), \ldots, \pi(u'_{k'})\} \cup \bar{Q}' \cup \{1\})$

$= \mathrm{median}(\{1 - \pi(u'_1), \ldots, 1 - \pi(u'_{k'})\} \cup Q' \cup \{0\})$ (17)

where $\bar{A} = \{u'_1, u'_2, \ldots, u'_{k'}\}$, $k' = |\bar{A}|$, and Q' is a k'-tuple of '1' and '0'. When
$Q' = (1, 0, \ldots, 0)$, we recover $N(A) = 1 - Q'(\bar{A})$, and when $Q' = (1, 1, \ldots, 1)$, we get
$\nabla(A) = 1 - Q'(\bar{A})$. When "most" of the values in A are highly possible, or when only few values outside A are
possible (i.e. equivalently, "most" of the values in $\bar{A}$ are impossible), which can be estimated using respectively
Q(A) (with r "close" to k), and $1 - Q'(\bar{A})$ where Q' models "few" (the number of '1' in Q' is small), we may consider
that this is the kind of situation where we would expect that A is true. Unfortunately, Q does not enjoy a
decomposability property with respect to the union of subsets in the general case, as $\Pi$ and $\Delta$ do.
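
The dual quantity of (17) can be sketched in the same way. In the illustration below, the function name `dual_q` and the example values are assumptions made here; r denotes the number of '1' in the tuple Q'.

```python
import statistics
from typing import Dict, Hashable, Set


def dual_q(pi: Dict[Hashable, float], A: Set[Hashable], r: int) -> float:
    """1 - Q'(complement of A), computed with the second median form of eq. (17)."""
    outside = [u for u in pi if u not in A]
    k = len(outside)
    if not 1 <= r <= k:
        raise ValueError("r must satisfy 1 <= r <= size of the complement of A")
    q_prime = [1.0] * r + [0.0] * (k - r)  # Q' with r ones
    return statistics.median([1.0 - pi[u] for u in outside] + q_prime + [0.0])


# Illustrative values only; the complement of A is {b, c, d}.
pi = {"a": 1.0, "b": 0.75, "c": 0.25, "d": 0.0}
A = {"a"}

necessity = min(1.0 - pi[u] for u in pi if u not in A)
unguaranteed = max(1.0 - pi[u] for u in pi if u not in A)

print(dual_q(pi, A, 1), necessity)      # Q' = (1, 0, 0) recovers N(A)
print(dual_q(pi, A, 3), unguaranteed)   # Q' = (1, 1, 1) recovers Nabla(A)
print(dual_q(pi, A, 2))                 # an intermediate estimate
# prints: 0.25 0.25 / 1.0 1.0 / 0.75 (one item per line)
```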

We may conclude that A should be true either by checking, on a completely known possibility distribution, that (at
least) most values outside A are impossible for instance, or from the computation of approximations of
$[\Delta,\Pi](A)$ and $[N,\nabla](A)$ on the basis of a partially-known possibility distribution, as explained below.


5 - Estimations Based on Partially-Known Possibility Distributions 

By a partially known possibility distribution, we mean that for each element u of U, the degree of possibility
$\pi_x(u)$ is only known to belong to an interval $[\pi^-_x(u), \pi^+_x(u)]$. The upper bound $\pi^+_x$ is normalized on U
since $\pi_x$ is supposed to be normalized, while $\pi^-_x$ is not necessarily normalized.

Then, the following bounds can be computed 


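$\Pi^+(A) = \max_{u \in A} \pi^+_x(u) \geq \Pi(A) \geq \max_{u \in A} \pi^-_x(u) = \Pi^-(A)$ (18)

$N^+(A) = \min_{u \notin A} (1 - \pi^+_x(u)) \leq N(A) \leq \min_{u \notin A} (1 - \pi^-_x(u)) = N^-(A)$ (19)

$\Delta^+(A) = \min_{u \in A} \pi^+_x(u) \geq \Delta(A) \geq \min_{u \in A} \pi^-_x(u) = \Delta^-(A)$ (20)

$\nabla^+(A) = \max_{u \notin A} (1 - \pi^+_x(u)) \leq \nabla(A) \leq \max_{u \notin A} (1 - \pi^-_x(u)) = \nabla^-(A)$. (21)

In other words we have inner and outer approximations of $[\Delta,\Pi]$ and $[N,\nabla]$, namely

$\forall A$, $[\Delta^+,\Pi^-](A) \subseteq [\Delta,\Pi](A) \subseteq [\Delta^-,\Pi^+](A)$ (22)

$\forall A$, $[N^-,\nabla^+](A) \subseteq [N,\nabla](A) \subseteq [N^+,\nabla^-](A)$ (23)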
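
The bounds (18)-(21) are straightforward to compute from the two bounding distributions. The sketch below is only an illustration under assumptions made here: the function name `bounds`, the dictionary keys and the numerical values are hypothetical, and both A and its complement are assumed to be non-empty.

```python
from typing import Dict, Hashable, Set, Tuple

Interval = Tuple[float, float]


def bounds(pi_lo: Dict[Hashable, float], pi_hi: Dict[Hashable, float],
           A: Set[Hashable]) -> Dict[str, Interval]:
    """(lower bound, upper bound) of each of the four measures, eqs. (18)-(21)."""
    inside = list(A)
    outside = [u for u in pi_lo if u not in A]
    return {
        "Pi": (max(pi_lo[u] for u in inside), max(pi_hi[u] for u in inside)),
        "Delta": (min(pi_lo[u] for u in inside), min(pi_hi[u] for u in inside)),
        # Impossibility levels decrease when possibility increases, so the
        # lower bounds of N and Nabla come from the upper distribution pi_hi.
        "N": (min(1.0 - pi_hi[u] for u in outside),
              min(1.0 - pi_lo[u] for u in outside)),
        "Nabla": (max(1.0 - pi_hi[u] for u in outside),
                  max(1.0 - pi_lo[u] for u in outside)),
    }


# Illustrative interval-valued distribution (made-up values).
pi_lo = {"a": 0.75, "b": 0.5, "c": 0.0, "d": 0.0}
pi_hi = {"a": 1.0, "b": 0.75, "c": 0.5, "d": 0.25}
A = {"a", "b"}

b = bounds(pi_lo, pi_hi, A)
print(b)
# Outer approximation [Delta-, Pi+] and inner approximation [Delta+, Pi-] of (22):
outer = (b["Delta"][0], b["Pi"][1])
inner = (b["Delta"][1], b["Pi"][0])
print(outer, inner)
# prints: (0.5, 1.0) (0.75, 0.75)
```

Here the inner approximation reduces to a single point; with other values its lower end could exceed its upper end, in which case the inner interval is empty.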


However $[\Delta^+,\Pi^-](A)$ may be empty if it happens that $\Delta^+(A) > \Pi^-(A)$, as well as $[N^-,\nabla^+](A)$ if
$N^-(A) > \nabla^+(A)$. A particular case which is worth considering is when $\exists V \subset U$, $\forall u \in V$,
$\pi^-_x(u) = \pi^+_x(u) = \pi_x(u)$ and $\forall u \in U - V$, $\pi^-_x(u) = 0$, $\pi^+_x(u) = 1$, i.e. $\pi_x$ is perfectly
known on a part of U and completely unknown elsewhere. Then the bound on N(A) obtained from the lower distribution
$\pi^-_x$,

$N^-(A) = \min_{u \in \bar{A} \cap V} (1 - \pi_x(u)) = N(A \cup \bar{V}) \geq N(A)$ (24)

(while $N^+(A) = 0$ as soon as $\bar{V} \not\subseteq A$) is a good candidate for estimating a beginning of certainty in favor
of A. Indeed, $N^-(A) = N(A \cup \bar{V})$ corresponds to the certainty in favor of a set less specific than A, but which
contains A. Note that $N(A) \geq N(A \cap V) = \min(N(A \cup \bar{V}), N(V))$ where N(V) is totally unknown, since $\pi_x$ is
only supposed to be known on V. Then $N^-(A) = N(A \cup \bar{V})$ is a good approximation of the certainty of A with respect
to the available information. Moreover if, together with $N^-(A) > 0$,
$\Delta^+(A) = \min_{u \in A \cap V} \pi_x(u) = \Delta(A \cap V) \leq \Pi^-(A) = \max_{u \in A \cap V} \pi_x(u) = \Pi(A \cap V) \leq \Pi(A)$
is large enough, we would not be surprised if A turns out to be true.
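
The particular case where the distribution is known exactly on a subset V and is completely unknown elsewhere can be sketched as follows. The encoding (a dictionary `known` defined on V only), the function names and the numerical values are assumptions introduced for this illustration.

```python
from typing import Dict, Hashable, Set, Tuple


def partial_bounds(known: Dict[Hashable, float],
                   U: Set[Hashable]) -> Tuple[Dict[Hashable, float], Dict[Hashable, float]]:
    """Build (pi_lo, pi_hi): equal to pi on V = keys of `known`, [0, 1] elsewhere."""
    pi_lo = {u: known.get(u, 0.0) for u in U}
    pi_hi = {u: known.get(u, 1.0) for u in U}
    return pi_lo, pi_hi


def n_from(pi: Dict[Hashable, float], A: Set[Hashable]) -> float:
    """min over u outside A of 1 - pi(u), applied to a bounding distribution."""
    return min((1.0 - pi[u] for u in pi if u not in A), default=1.0)


# Illustrative values only: pi is known on V = {a, b, c} and unknown on d.
U = {"a", "b", "c", "d"}
known = {"a": 1.0, "b": 0.75, "c": 0.25}
V = set(known)
A = {"a", "b"}

pi_lo, pi_hi = partial_bounds(known, U)
n_minus = n_from(pi_lo, A)   # bound N-(A), computed from pi_lo
n_plus = n_from(pi_hi, A)    # bound N+(A), computed from pi_hi

# N(A u complement of V) only involves values in V outside A, where pi is known.
n_enlarged = min((1.0 - known[u] for u in V - A), default=1.0)

delta_plus = min(pi_hi[u] for u in A)   # Delta+(A) = Delta(A n V)
pi_minus = max(pi_lo[u] for u in A)     # Pi-(A) = Pi(A n V)

print(n_minus, n_enlarged, n_plus, delta_plus, pi_minus)
# prints: 0.75 0.75 0.0 0.75 1.0
```

In this example $N^-(A) > 0$ and $\Delta^+(A)$ is fairly high, which is the situation described above in which one would not be surprised to find that A is true.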


6 - Potential Applications

Although this note is basically oriented towards the formalization of the concepts of expectedness and surprise in the 
framework of possibility theory, let us briefly outline some potential applications. 

A first use we may think of is the representation of decision rules of the kind "if A is expected then do...", which is a
soft and more realistic version of the rule "if A is certain then do...".

Another use might be in information systems where we want to rank the items according to the extent to which they can be
expected to satisfy the request. This might be of particular interest if the set of items which more or less certainly
satisfy the request is empty and the set of items which satisfy it only possibly is too large.
Clearly, it is important not only to be able to represent incomplete, uncertain, vague states of knowledge, but also
to understand what a model offers for modelling expectation. This may be important in knowledge-based systems, for
instance for representing the state of knowledge of the user and deciding what explanation has to be provided to
him/her (usually what is expected need not be explained and what is surprising has to be explained).

Another issue that our future work in this area will focus on is possibility distribution generation based upon
surprise and expectedness qualification. Consider a proposition like "I expect (at degree $\alpha$) that John will be late".
This proposition can be seen to induce a possibility distribution $\pi$ over John's arrival time. In particular we see
that this requires the solution of an equation of the type $\alpha = \mathrm{Exp}[A / \pi]$. In a similar fashion, propositions
like "I would be surprised (at the degree $\alpha$) if John is early" or "I would not be surprised (at the degree $\beta$) if
John is late" can be seen to induce possibility distributions. The ability to generate possibility distributions from
propositions of the above type would provide an interesting tool in knowledge representation.


7 - Concluding Remarks

In this note we have investigated the estimates which can be attached to a non-fuzzy event A when the available
knowledge is modelled by a possibility distribution (even if this distribution is only partially specified). The role of
four basic measures has been emphasized: two of them define an interval related to the estimation of the idea of
possibility, while the two others define another interval related to the idea of necessity or certainty. The characteristic
properties of these intervals have been laid bare. Other quantities, which generalize the previous ones in various
ways, have been introduced. The appropriateness of these different degrees for estimating how much an event can be
expected to be true, or how much its occurrence is not surprising, has been discussed. All these measures could be
extended to fuzzy events A.